A Normalization Method for Contextual Data: Experience from a Large-scale Application. Tr-98-02

Abstract

close, and (ii) no classifier consistently outperformed the others over the four learning tasks. The inadequacy of simple accuracies for this problem can be explained by the fact that our data sets are imbalanced [11]. In terms of the number of false alarms, the values obtained from our approach performed very well. In two cases (30 and 15 days), they led to the minimum number of false alarms. In the other two cases (45 and 10 days), the numbers of false alarms with the new values were close to the minimum obtained. It is also interesting to note that the classifier obtained with values from our approach generates fewer false alarms than the one obtained with the manufacturer's formulas in all four tasks.

4 Discussion and conclusion

The approach presented in this paper is less sensitive than ANOVA to violations of the following assumptions: (i) normality of the data, (ii) equal variances for the different groups, and (iii) independent error components. In fact, the only impact of a violation of these assumptions is on the convergence rate. When these assumptions do not hold, we are likely to make more errors during pairwise mean comparisons, which results in non-optimal normalization in each iteration of the approach. As a consequence, the overall process is slower.

In this paper, we describe a pre-processing method that cancels the effects of contextual attributes. This method includes two algorithms: one for contextual analysis and the other for normalization. To evaluate our approach, we developed classifiers for prediction tasks. Results showed that the number of false alarms was substantially lower when we used our normalized attributes instead of those obtained from manufacturer-supplied formulas. We believe that our approach has a lot of potential for performing advanced data analysis in context-sensitive domains where the class attribute is not known ahead of time.
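
The excerpt does not reproduce the two algorithms, so the following is only a minimal sketch of the general idea of cancelling a contextual effect. It assumes a single categorical contextual attribute and swaps in plain per-context standardization (subtract the context group's mean, divide by its standard deviation). This is not the authors' iterative procedure based on pairwise mean comparisons, and the function name `normalize_by_context` and its arguments are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_context(values, contexts):
    """Standardize each value against the mean and spread of its own context group.

    Assumption: one categorical contextual attribute; a simple stand-in for
    the normalization step, not the method described in the paper.
    """
    # Collect the values observed under each context.
    groups = defaultdict(list)
    for value, context in zip(values, contexts):
        groups[context].append(value)

    # Per-context mean and population standard deviation.
    # A zero spread is replaced by 1.0 to avoid division by zero.
    stats = {}
    for context, group in groups.items():
        spread = pstdev(group)
        stats[context] = (mean(group), spread if spread > 0 else 1.0)

    # Remove the contextual offset and scale from every observation.
    return [
        (value - stats[context][0]) / stats[context][1]
        for value, context in zip(values, contexts)
    ]


# Hypothetical readings of one attribute taken under two operating contexts.
readings = [10, 12, 14, 100, 102, 104]
operating_context = ["low", "low", "low", "high", "high", "high"]
print(normalize_by_context(readings, operating_context))
```

After this step the two groups share the same scale, so a later learner sees deviations from what is typical for a context rather than the context itself.
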
The approach could be applied to the timely prediction of failures, which in most cases are very expensive to deal with. Finally, our approach can also be used for dimensionality reduction so that data analysis can be performed more precisely.

Acknowledgments

The authors would like to thank Air Canada for providing the data and for the very useful feedback on this research.

Similar resources

A Normalization Method for Contextual Data: Experience from a Large-Scale Application

The paper describes a pre-processing technique to normalize contextually-dependent data before applying machine learning algorithms. Unlike many previous methods, our approach to normalization does not assume that the learning task is a classification task. We propose a data pre-processing algorithm which modifies the relevant attributes so that the effects of the contextual attributes on the releva...

A Practical Desalinization Model for Large Scale Application

Salinity of soil and water is the most important agricultural hazard in arid and semi-arid regions. In saline soils, yield is directly influenced by soluble salts in the root zone as well as by shallow water table depth. The first step in reclaiming such soils is reducing salinity to an optimum level by leaching. The objective of this study was to develop a practical model to estimate wat...

Microarray Image Analysis for Detecting Breast Cancer Type

Background: Microarray technology is a powerful tool to study and analyze the behavior of thousands of genes simultaneously. Images of microarray have an important role in the detection and treatment of diseases. The aim of this study is to provide an automatic method for the extraction and analysis of microarray images to detect cancerous diseases. Methods: The proposed system consists of t...

Comparison of Count Normalization Methods for Statistical Parametric Mapping Analysis Using a Digital Brain Phantom Obtained from Fluorodeoxyglucose-positron Emission Tomography

Objective(s): Alternative normalization methods were proposed to solve the biased information of SPM in the study of neurodegenerative disease. The objective of this study was to determine the most suitable count normalization method for SPM analysis of a neurodegenerative disease based on the results of different count normalization methods applied on a prepared digital phantom similar to one ...

Publication date: 1998